MFI-TransSW+: Efficiently Mining Frequent Itemsets in Clickstreams

نویسندگان

  • Franklin A. de Amorim
  • Bernardo Pereira Nunes
  • Giseli Rabello Lopes
  • Marco A. Casanova
چکیده

Data stream mining is the process of extracting knowledge from massive real-time sequence of data items arriving at a very high data rate. It has several practical applications, such as user behavior analysis, software testing and market research. However, the large amount of data generated may offer challenges to process and analyze data at nearly real time. In this paper, we first present the MFI-TransSW+ algorithm, an optimized version of MFI-TransSW algorithm that efficiently processes clickstreams, that is, data streams where the data items are the pages of a Web site. Then, we outline the implementation of a news articles recommendation system, called ClickRec, to demonstrate the efficiency and applicability of the proposed algorithm. Finally, we describe experiments, conducted with real world data, which show that MFI-TransSW+ outperforms the original algorithm, being up to two orders of magnitude faster when processing clickstreams.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Mining frequent itemsets over data streams using efficient window sliding techniques

Online mining of frequent itemsets over a stream sliding window is one of the most important problems in stream data mining with broad applications. It is also a difficult issue since the streaming data possess some challenging characteristics, such as unknown or unbound size, possibly a very fast arrival rate, inability to backtrack over previously arrived transactions, and a lack of system co...

متن کامل

SmartMiner: A Depth First Algorithm Guided by Tail Information for Mining Maximal Frequent Itemsets

Maximal frequent itemsets (MFI) are crucial to many tasks in data mining. Since the MaxMiner algorithm first introduced enumeration trees for mining MFI in 1998, there have been several methods proposed to use depth first search to improve performance. To further improve the performance of mining MFI, we proposed a technique to gather and pass tail (of a node) information to determine the next ...

متن کامل

CLAIM: An Efficient Method for Relaxed Frequent Closed Itemsets Mining over Stream Data

Recently, frequent itemsets mining over data streams attracted much attention. However, mining closed itemsets from data stream has not been well addressed. The main difficulty lies in its high complexity of maintenance aroused by the exact model definition of closed itemsets and the dynamic changing of data streams. In data stream scenario, it is sufficient to mining only approximated frequent...

متن کامل

Discovering Maximal Frequent Item set using Association Array and Depth First Search Procedure with Effective Pruning Mechanisms

The first step of association rule mining is finding out all frequent itemsets. Generation of reliable association rules are based on all frequent itemsets found in the first step. Obtaining all frequent itemsets in a large database leads the overall performance in the association rule mining. In this paper, an efficient method for discovering the maximal frequent itemsets is proposed. This met...

متن کامل

Fast Algorithms for Mining Generalized Frequent Patterns of Generalized Association Rules

Mining generalized frequent patterns of generalized association rules is an important process in knowledge discovery system. In this paper, we propose a new approach for efficiently mining all frequent patterns using a novel set enumeration algorithm with two types of constraints on two generalized itemset relationships, called subset-superset and ancestor-descendant constraints. We also show a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016